Skip to content

Conversation

@lordnn
Copy link

@lordnn lordnn commented Feb 4, 2025

…code.

@lordnn lordnn changed the title AArch64 specific NEON-intrinsics was wrapped to macros. refactor: AArch64 specific NEON-intrinsics was wrapped to macros. Feb 4, 2025
@lordnn lordnn changed the title refactor: AArch64 specific NEON-intrinsics was wrapped to macros. refactor: AArch64 specific NEON intrinsics was wrapped to macros. Feb 6, 2025
@lordnn lordnn changed the title refactor: AArch64 specific NEON intrinsics was wrapped to macros. refactor: AArch64 specific NEON intrinsics was wrapped into macros. Feb 6, 2025
Copy link
Collaborator

@kpchoi kpchoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the following comment is required.

{ // Row 0
int32x4_t sad_vector = vdupq_n_s32(0);
// Loop unrolling
#pragma GCC unroll 8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this work on non-GCC compiler?

Copy link
Author

@lordnn lordnn Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works also on CLang and does nothing on MSVC, because of unknown pragma is not an error.
P.S. It may produce warning on MSVC, in dependency on warning level.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that the unrolling will not work properly on MSVC, and it can lead slow operation.

Copy link
Author

@lordnn lordnn Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell without measurement. Measure before Optimizing.

  1. Small cycle will not flush instruction cache, so it may works faster without unrolling.
  2. MSVC may unroll it without pragmas. Trust to compiler.
  3. Do you really use MSVC to compile ARM project?

@lordnn lordnn changed the title refactor: AArch64 specific NEON intrinsics was wrapped into macros. refactor: Some AArch64 specific NEON intrinsics was wrapped into macros. Feb 12, 2025
@lordnn
Copy link
Author

lordnn commented Feb 14, 2025

Now it can be compiled for ARMv7 target, I hope.

@kpchoi kpchoi requested a review from neergil February 17, 2025 08:20
@neergil
Copy link
Collaborator

neergil commented Feb 17, 2025

The PR 1) proposes using several Macros, and 2) fixes minor typos, to refactor the ARM implementation. Please provide the following details:
a) Whether the bit-exactness is fully-tested
b) How the performance compared with the existing code (unit test and overall encode/decode time on ARM device)
c) Whether the PR maintains a fully clean build (without warnings) for all platforms including MS Visual Studio
Please provide the above information so we can decide on reviewing in detail.

@lordnn
Copy link
Author

lordnn commented Feb 17, 2025

с) Currently muster brunch produce warnings during MSVC 2022 build. So, please clean it's first. And what's the "all platforms" you mention about? Please provide details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants